Week 8 Final Project

Programming Assignment Development Feedback

Remember, the weekly assignments in this course are intentionally complex. All of the skills and knowledge needed to complete the assignments have been woven into the course modules, but I am not teaching directly to the assignments. Instead, I am asking you to pull together the pieces of the puzzle that will help you solve the problem. It is important that you DO NOT PROCRASTINATE. I am here to support you, but timely feedback requires you to work proactively.

At any time during week 8, you may submit screenshots, draft files, etc., along with specific questions related to the development of this assignment. The goal is not for me to "pre-grade" your work, but to offer guidance and point you in the right direction. I encourage you to make use of this opportunity to refine and develop your work. Refer to my Instructor Introduction and the Course Syllabus for the best ways to contact me for feedback and for expected response times.

________________________________________

Format and Submission

For this assignment, use RStudio. Your submission should have two files. The first file should be the R script, saved as "ProjectFinal_LastName_FirstName.R" (note the .R file extension). The second file should be a Microsoft Word file that documents your solutions with screenshots and provides written responses to the questions below (unless a question specifically asks for the response to be given in the code comments). Both files are mandatory. You don't need an individual screenshot for each question, but your screenshots should cover all questions.

________________________________________

Assignment Scenario and Tasks

***NOTE: For each question, make sure that your work does not repeat something in the course modules or the textbook. Each response must be original and your own submission.
You are strictly prohibited from having another person write, review, or edit your solution. Failure to follow this may result in a failing grade.

You were just hired as a Data Scientist in the Biking Sharing Program (BSP) at the Department of Transportation in Washington, DC, U.S.A. BSP is more flexible than a traditional bike rental program: renters can become members, rent a bike at one location, and return it at another. A GPS unit attached to each bike helps record the duration of travel, departure and arrival, etc. BSP has two years of data in the file day.csv, which contains the following columns:

- instant: record index
- dteday: date
- season: season (1: springer, 2: summer, 3: fall, 4: winter)
- yr: year (0: 2011, 1: 2012)
- mnth: month (1 to 12)
- hr: hour (0 to 23)
- holiday: whether the day is a holiday or not (extracted from http://dchr.dc.gov/page/holiday-schedule)
- weekday: day of the week
- workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0
- weathersit:
    1: Clear, Few clouds, Partly cloudy, Partly cloudy
    2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp: normalized temperature in Celsius. The values are divided by 41 (max)
- atemp: normalized feeling temperature in Celsius. The values are divided by 50 (max)
- hum: normalized humidity. The values are divided by 100 (max)
- windspeed: normalized wind speed. The values are divided by 67 (max)
- casual: count of casual users
- registered: count of registered users
- cnt: count of total rental bikes, including both casual and registered

Data Source: https://www.kaggle.com/c/bike-sharing-demand/data

You are asked to perform the following tasks by writing a script in R, and to submit both the R code and a Word document.

1. Load the dataset day.csv into memory.
2. Perform the following data preparations using control structures:
   a. Convert numerical season (1, 2, 3, 4) to characters (springer, summer, fall, and winter).
   b. Convert numerical weathersit (1, 2, 3, 4) to characters (Good, Mist, Bad, Severe).
3. Consider the following predictors: season, holiday, workingday, weathersit, atemp, hum, windspeed, casual, and registered. List all categorical variables from this list and convert them to factors.
4. Calculate the minimum, maximum, mean, median, standard deviation, and three quartiles (25th, 50th, and 75th percentiles) of cnt.
5. Calculate the minimum, maximum, mean, median, standard deviation, and three quartiles (25th, 50th, and 75th percentiles) of registered.
6. Calculate the correlation coefficient of the two variables registered and cnt. Do they have a strong relationship?
7. Calculate the frequency table of season. What is the mode of the season variable?
8. Calculate the cross table of season and weathersit, then produce proportions by rows and by columns, respectively.
9. Plot the histogram and density of cnt, and add a vertical line denoting the mean, using ggplot2.
10. Create a scatter plot of cnt (y-axis) against registered (x-axis) and add the trend line, using ggplot2.
11. Plot season and weathersit on the same bar plot, using ggplot2.
12. Create a boxplot of cnt (y-axis) against weathersit (x-axis) and save the graph in a file, cntweather.jpg, using ggplot2. Are there any differences in cnt with respect to weathersit?
13. Build the following multiple linear regression models:
   a. Perform multiple linear regression with cnt as the response and the predictors season, weathersit, atemp, and registered. Write down the math formula with numerical coefficients for the predictors atemp and registered, and skip the coefficients for season and weathersit.
   b. Perform multiple linear regression with cnt as the response and the predictors season, workingday, weathersit, atemp, and registered. Write down the math formula with numerical coefficients for the predictors atemp and registered, and skip the coefficients for season, workingday, and weathersit.
   c. Perform multiple linear regression with cnt as the response and the predictors season, holiday, workingday, weathersit, atemp, hum, windspeed, and registered. Write down the math formula with numerical coefficients for the predictors atemp, hum, windspeed, and registered, and skip the coefficients for season, holiday, workingday, and weathersit.
   d. Which model do you recommend to the management based on adjusted R squared? Justify your answer.
14. Build the following logistic models:
   a. Forecast holiday using cnt, season, and registered.
   b. Forecast holiday using cnt, season, weathersit, and registered.
   c. Forecast holiday using cnt, season, weathersit, workingday, and registered.
   d. Which model do you recommend to the management based on McFadden's pseudo R squared? Justify your answer.
15. Summarize Question 13 and Question 14 using R Markdown to generate reproducible reports.

Final Course Project Rubric

R Codes and Style (120 pts)
- 120 pts, Full Marks
- 96 pts, Good: R codes are easy to read, share, and verify. There are few comments. There are fewer than 3 bugs in the script.
- 72 pts, Average: R codes can be read. There are no comments. There are fewer than 4 bugs in the script.
- 48 pts, Below Average: R codes are hard to read. There are no comments. There are more than 5 bugs in the script.
- 24 pts, Insufficient: R codes are hard to read. There are no comments. There are more than 5 bugs in the script.

Interpretation and Use of Model in the Word Document (80 pts)
- 80 pts, Excellent: The data and model are accurately interpreted to justify the answer, and sufficient data and model are used to defend the main argument.
- 64 pts, Good: The data and model are accurately interpreted to justify the answer, and the model is used to defend the main argument, but it might not be sufficient.
- 48 pts, Average: Data and model are used to defend the main argument, but the idea and model are not accurately interpreted, and the support might not be sufficient.
- 32 pts, Average: Data and model are used to defend the main argument, but the support is insufficient.
- 16 pts, Insufficient: Data and model are provided, but they are not used to defend the main argument.
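For the summary-statistics and model-comparison questions (4 to 6, 13d, and 14d), the following is a minimal base-R sketch of the mechanics only. It runs on simulated data, not on day.csv, so none of the numbers it produces say anything about the real dataset. The data frame `toy`, the helper functions `describe` and `mcfadden`, and the reduced predictor sets are all assumptions made for illustration. McFadden's pseudo R squared is computed here as 1 minus the ratio of the fitted model's log-likelihood to that of an intercept-only model.

```r
# Simulated stand-in for day.csv (hypothetical values, NOT the real data).
set.seed(42)
n <- 200
toy <- data.frame(
  atemp      = runif(n, 0.1, 0.8),
  registered = round(runif(n, 500, 6000)),
  workingday = rbinom(n, 1, 0.7),
  holiday    = rbinom(n, 1, 0.2)
)
toy$cnt <- round(toy$registered + 1500 * toy$atemp + rnorm(n, 0, 300))

# Hypothetical helper: min, max, mean, median, sd, and the three quartiles.
describe <- function(x) {
  c(min = min(x), max = max(x), mean = mean(x), median = median(x),
    sd = sd(x), quantile(x, probs = c(0.25, 0.50, 0.75)))
}
cnt_summary <- describe(toy$cnt)

# Pearson correlation between registered and cnt.
r_reg_cnt <- cor(toy$registered, toy$cnt)

# Two nested linear models compared on adjusted R squared.
m1 <- lm(cnt ~ atemp + registered, data = toy)
m2 <- lm(cnt ~ workingday + atemp + registered, data = toy)
adj_r2 <- c(m1 = summary(m1)$adj.r.squared,
            m2 = summary(m2)$adj.r.squared)

# Hypothetical helper: McFadden's pseudo R squared from two fitted glm objects.
mcfadden <- function(model, null_model) {
  1 - as.numeric(logLik(model)) / as.numeric(logLik(null_model))
}

# Logistic models for holiday, plus the intercept-only baseline.
g0 <- glm(holiday ~ 1, family = binomial, data = toy)
g1 <- glm(holiday ~ cnt + registered, family = binomial, data = toy)
g2 <- glm(holiday ~ cnt + registered + workingday, family = binomial, data = toy)
pseudo_r2 <- c(g1 = mcfadden(g1, g0), g2 = mcfadden(g2, g0))

print(round(cnt_summary, 2))
print(r_reg_cnt)
print(adj_r2)
print(pseudo_r2)
```

Adjusted R squared penalizes extra predictors, so it can go down when a weak predictor is added, whereas the unadjusted pseudo R squared of a nested logistic model cannot decrease. That difference is worth keeping in mind when justifying a recommendation on either metric.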

Solution Preview

1. Load the dataset into memory.
2. Perform the following data preparations using control structures:
   a. Convert numerical season (1, 2, 3, 4) to characters (springer, summer, fall, and winter).
   b. Convert numerical weathersit (1, 2, 3, 4) to characters (Good, Mist, Bad, Severe).
3. Consider the following predictors: season, holiday, workingday, weathersit, atemp, hum, windspeed, casual, and registered. List all categorical variables from this list and convert them to factors.
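The recoding and factor-conversion steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the preview's actual solution: the data frame `bike` holds a few made-up rows instead of reading day.csv, and treating season, holiday, workingday, and weathersit as the categorical variables is one reading of the column descriptions. The loop shows two control structures side by side, an `if`/`else if` chain for season and `switch()` for weathersit.

```r
# Toy stand-in for day.csv (hypothetical rows; the real file is not read here).
# With the real file, something like read.csv("day.csv") would replace this.
bike <- data.frame(
  season     = c(1, 2, 3, 4, 1),
  weathersit = c(1, 2, 3, 1, 2),
  holiday    = c(0, 0, 1, 0, 0),
  workingday = c(1, 1, 0, 1, 0),
  atemp      = c(0.36, 0.55, 0.62, 0.30, 0.41)
)

# Pre-allocate character vectors for the recoded values.
season_chr  <- character(nrow(bike))
weather_chr <- character(nrow(bike))

for (i in seq_len(nrow(bike))) {
  # if / else if chain: branch on the numeric season code.
  code <- bike$season[i]
  if (code == 1) {
    season_chr[i] <- "springer"
  } else if (code == 2) {
    season_chr[i] <- "summer"
  } else if (code == 3) {
    season_chr[i] <- "fall"
  } else {
    season_chr[i] <- "winter"
  }

  # switch() with an integer picks the label by position (1 to 4).
  weather_chr[i] <- switch(as.integer(bike$weathersit[i]),
                           "Good", "Mist", "Bad", "Severe")
}

bike$season     <- season_chr
bike$weathersit <- weather_chr

# Assumed categorical columns; the numeric measurements are left untouched.
categorical <- c("season", "holiday", "workingday", "weathersit")
bike[categorical] <- lapply(bike[categorical], factor)

str(bike)
```

An explicit loop is used because the task asks for control structures. A vectorized lookup such as indexing a label vector by the numeric code would be shorter, but it would not demonstrate `if` or `switch()`.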